An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words
Similar resources
Segmenting DNA sequence into `words'
[Abstract] This paper presents a novel method to segment/decode DNA sequences based on a statistical language model. First, we find that the length of most DNA “words” is 12 to 15 bps by analyzing the genomes of 12 model species. Then we apply an unsupervised approach to build the DNA vocabulary and design a DNA sequence segmentation method. We also find that different genomes are likely to use the similar...
Segmenting unrestricted Chinese text into prosodic words instead of lexical words
This paper stresses the importance of converting a string of lexical words to that of prosodic words in TTS systems by presenting the surface differences and perceptual differences between them. A statistical rule-based method and a CART-based method are proposed as solutions. Though the ComplicatedSet-based CART method performs the best, the achievement is obtained at the cost of heavy computation...
Segmenting Chinese Unknown Words by Heuristic Method
Chinese text segmentation is important in Chinese text indexing. Due to the lack of word delimiters in Chinese text, Chinese text segmentation is more difficult than English text segmentation. Besides, the segmentation ambiguities and the occurrences of out-of-vocabulary words (i.e. unknown words) are the major challenges in Chinese segmentation. Many research works dealing with the problem of ...
Segmenting Sentences into Linky Strings Using D-bigram Statistics
It is obvious that segmentation plays an important role in natural language processing (NLP), especially for languages whose sentences are not easily separated into morphemes. In this study we propose a method of segmenting a sentence. The system described in this paper does not use any grammatical information or knowledge in processing. Instead, it uses statistical information drawn from n...
An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four language...
Journal
Journal title: MATEC Web of Conferences
Year: 2019
ISSN: 2261-236X
DOI: 10.1051/matecconf/201926704001